Bayesian Inference

This week we're going to explore Markov chain Monte Carlo (commonly referred to as MCMC) methods, which are techniques for sampling based on random walks. In other words, MCMC is a class of techniques for sampling from a probability distribution, and it can be used to estimate the distribution of model parameters given a set of observations.

After this session you will be able to implement the Metropolis algorithm yourself and use pymc3 to estimate model parameters via MCMC sampling.

Keywords: random walks — Bayesian inference — Markov chain Monte Carlo: sampling, Metropolis algorithm

Markov Chain Monte Carlo: The Metropolis Algorithm

The Metropolis-Hastings algorithm is a technique for sampling from high-dimensional distributions or functions that are difficult to sample directly. It is often used in cases where direct sampling is not feasible due to intractable integrals. It is an MCMC algorithm: Markov chain because the next sample $x'$ depends only on the current sample $x$, and Monte Carlo because it generates random samples which can be used to, e.g., compute integrals.

The key ingredient in this technique is the proposal distribution $g(x \rightarrow x')$, which is used to draw the next candidate of the Markov chain from the current state. From it, the acceptance probability $\alpha$ is calculated: the probability used to decide whether the new sample is accepted or not, given by $\alpha = \min\left(1, \frac{f(x')}{f(x)} \frac{g(x' \rightarrow x)}{g(x \rightarrow x')}\right)$, where $f(x)$ is any function proportional to the target distribution $P(x)$ we want to sample from. If the proposal distribution is symmetric, i.e. $g(x \rightarrow x') = g(x' \rightarrow x)$, the algorithm is simply called Metropolis, and the second factor in the expression for $\alpha$ is unity. This is true, for example, if $g$ is a Gaussian centred on the current state.


Exercise 1:

The steps of the Metropolis algorithm are:

1. Pick a starting point $(x, y)$.  
2. Calculate $f(x, y)$.  
3. Draw a proposal parameter $(x', y')$ from a symmetric distribution.  
4. Calculate $f(x', y')$.  
5. Accept the proposal with probability $\alpha = \min(1, \frac{f(x', y')}{f(x, y)})$.  
6. If accepted, set $(x, y) = (x', y')$; otherwise keep the current $(x, y)$. Repeat from step 3 for the desired number of iterations.  
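The steps above can be sketched in a few lines of numpy. This is a minimal illustration, not the exact implementation asked for in the exercise: the function and variable names (`metropolis`, `log_f`, `step_size`) are our own choices, and for numerical stability we work with the logarithm of $f$, so the acceptance test $u < f(x')/f(x)$ becomes $\log u < \log f(x') - \log f(x)$.

```python
import numpy as np

def metropolis(log_f, x0, n_steps=5000, step_size=0.5, seed=0):
    """Sample from a density proportional to exp(log_f) with the
    Metropolis algorithm, using a symmetric Gaussian proposal."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)       # step 1: starting point
    log_fx = log_f(x)                     # step 2: evaluate f there
    samples = np.empty((n_steps, x.size))
    for i in range(n_steps):
        # step 3: symmetric Gaussian proposal centred on the current state
        x_new = x + step_size * rng.standard_normal(x.size)
        log_fx_new = log_f(x_new)         # step 4
        # step 5: accept with probability min(1, f(x')/f(x))
        if np.log(rng.random()) < log_fx_new - log_fx:
            x, log_fx = x_new, log_fx_new # step 6: move to the new state
        samples[i] = x                    # on rejection the state repeats
    return samples

# Example target: an (unnormalised) 2D standard Gaussian
log_target = lambda x: -0.5 * np.sum(x**2)
chain = metropolis(log_target, x0=[3.0, -3.0])
```

After discarding the first part of the chain (the "burn-in", while the walk moves from the arbitrary starting point into the bulk of the distribution), the sample mean and standard deviation approximate those of the target.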

Intermezzo: pymc3 - a library for MCMC sampling

In your daily life as a data scientist, you might not want to implement MCMC sampling yourself. Luckily, the pymc3 module provides all the functionality we need. What follows is the general usage of the module, demonstrated by fitting a straight line whose parameters are estimated with an MCMC algorithm.

In the lecture Bayes' theorem was introduced. We can write it in a slightly shorter version as

$$p(\theta|x, y) \propto p(\theta)\cdot p(x, y|\theta)$$

i.e. the posterior distribution of parameters $\theta$ is proportional to the prior times the likelihood of our data sample. We can use this formalism to solve a linear regression problem. Let us assume our fitting parameters are $\theta = (q, s, \sigma)$, where $q$ is the intercept, $s$ the slope and $\sigma$ the intrinsic scatter of $y$. Furthermore, let us assume that our observed $y$ is normally distributed around the true regression relation. The likelihood of our data sample is then easily computed (see lecture notes).
The prior encapsulates all information we believe to have about the fitting parameters before having seen any data! For example, it can encode realistic ranges within which we expect the parameters to lie.
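To make these ingredients concrete, here is one possible way to write them down in numpy (a sketch, not the lecture's exact formulation): flat priors on broad ranges for $(q, s, \sigma)$, and the Gaussian likelihood that follows from assuming $y_i \sim \mathcal{N}(q + s x_i, \sigma)$. The specific prior ranges below are illustrative choices, not prescribed values.

```python
import numpy as np

def log_prior(q, s, sigma):
    # Flat prior over broad, "realistic" ranges; zero probability outside
    if -10 < q < 10 and -10 < s < 10 and 0 < sigma < 10:
        return 0.0
    return -np.inf

def log_likelihood(q, s, sigma, x, y):
    # Sum of Gaussian log-densities: y_i ~ Normal(q + s*x_i, sigma)
    mu = q + s * x
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - 0.5 * (y - mu)**2 / sigma**2)

def log_posterior(q, s, sigma, x, y):
    # Bayes' theorem (up to a constant): log posterior = log prior + log likelihood
    lp = log_prior(q, s, sigma)
    if not np.isfinite(lp):
        return -np.inf
    return lp + log_likelihood(q, s, sigma, x, y)
```

This `log_posterior` is exactly the kind of function an MCMC sampler needs: it is proportional to the posterior, so the unknown normalising integral never has to be computed.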

What we want is to sample from the posterior distribution (which in real-world examples can be very complicated!). Once we have a good idea of how this distribution looks for each parameter, we can compute the expected parameter values (mean) as well as their uncertainties (standard deviation). Let's see how we can do this in code.

Exercise 2:

In this exercise we will apply the methods introduced above to fit a regression line whose parameters are estimated via MCMC sampling. Furthermore, we will compare the linear fits obtained from MCMC sampling with those from ordinary linear regression.